Abstract

Given a random sample from some unknown density $f_0 : \mathbb{R} \to [0, \infty)$ we devise
Haar wavelet estimators for $f_0$ with variable resolution levels constructed from
localised test procedures (as in Lepski, Mammen, and Spokoiny (1997, Ann. Statist.)).
We show that these estimators satisfy an oracle inequality that adapts to heterogeneous
smoothness of $f_0$, simultaneously for every point $x$ in a fixed interval, in
sup-norm loss. The thresholding constants involved in the test procedures can be
chosen in practice under the idealised assumption that the true density is locally
constant in a neighborhood of the point $x$ of estimation, and an information theoretic
justification of this practice is given.

1 Introduction 

One of the most enduring challenges in statistical function estimation is to devise
procedures that adapt to the locally variable complexity of the unknown function. For
example, if one observes a random sample $X_1, \dots, X_n$ with density $f_0 : \mathbb{R} \to \mathbb{R}$, then
$f_0$ may exhibit spatially inhomogeneous smoothness: The density could be infinitely
differentiable on most of its support except for a few points $x_m$ where it behaves locally
like $|x - x_m|^{\alpha_m}$ for some distinct numbers $\alpha_m$. The location of the irregular points
$x_m$ will usually not be known, and neither will the corresponding degree of smoothness $\alpha_m$.
Moreover $f_0$ could possess a so-called multifractal behavior, changing its Hölder
exponents continuously on its domain of definition - in fact, as shown in Jaffard [11], 'typical'
functions in the Besov spaces usually considered in nonparametric statistics are always
multifractal. Donoho and Johnstone [1] and Donoho, Johnstone, Kerkyacharian, and
Picard [2], [3] have suggested that methods based on wavelet shrinkage can, to a certain
extent, adapt to spatially inhomogeneous complexity of the unknown function $f_0$.
Moreover, Lepski, Mammen, and Spokoiny [12] showed that this is not intrinsic to wavelet
methods, and that similar spatial adaptation results can be proved for kernel methods
based on locally variable bandwidth choices.

Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, University of
Cambridge, CB3 0WB, Cambridge, UK. Email: r.nickl@statslab.cam.ac.uk

Weierstrass Institute for Applied Analysis and Stochastics, Mohrenstrasse 39, 10117 Berlin,
Germany. Email: spokoiny@wias-berlin.de. The author is partially supported by Laboratory for Structural
Methods of Data Analysis in Predictive Modeling, MIPT, RF government grant, ag. 11.G34.31.0073.
Financial support by the German Research Foundation (DFG) through the Collaborative Research Center
649 "Economic Risk" is also gratefully acknowledged.

There are several ways in which one can measure spatial adaptivity of an estimator.
A minimal requirement may be to devise a rule $\hat f_n(x)$ that estimates $f_0(x)$ in an optimal
way at every point $x$, and the methods suggested in [1] and [12] meet this requirement.
These procedures depend on the point $x$, and the natural question arises as to how
a given procedure performs globally as an estimator for $f_0$. To address this question,
Donoho et al. [3] and Lepski et al. [12] considered global $L^r$-loss, $r < \infty$, and argued
that taking $L^r$-loss over Besov-bodies $B(s,p,q)$ where smoothness is measured in $L^p$,
$r > p$, gives a way to assess the spatial performance of an estimator. A probably more
transparent approach to the spatial adaptation problem is to consider sup-norm loss for
estimators with locally variable bandwidths: one aims to find an estimator $\hat f_n(x)$ that is
locally optimal for estimating $f_0(x)$, and simultaneously so for all $x$. This approach has
not been considered in the literature so far - the results [6], [7], [8], [9] address the spatially
homogeneous setting only.

A first contribution of this article is to show that a dyadic histogram estimator with
variable bin size spatially adapts to possibly inhomogeneous local Hölder smoothness
of $f_0$, in global sup-norm loss. More precisely, for $K(x,y)$ the Haar wavelet projection
kernel, we shall construct

$$\hat f_n(x) = \frac{1}{n}\sum_{i=1}^n 2^{\hat j_n(x)} K\big(2^{\hat j_n(x)} x,\, 2^{\hat j_n(x)} X_i\big),$$

where $\hat j_n(x)$ is a variable resolution level that depends both on $x$ and the sample, and
show that the random variable

$$\sup_{x \in [0,1]} \frac{|\hat f_n(x) - f_0(x)|}{r(n,x,f_0)}$$

is stochastically bounded, where $r(n,x,f_0)$ is the optimal risk of an 'oracle estimator' for
$f_0$ at the point $x$. We show moreover that this rate equals the pointwise minimax rate of
adaptive estimation for $f_0(x)$ at every $x$, and that spatial adaptation occurs uniformly in
$x$ except near discontinuities of the Hölder exponent function $t(f,x)$, see after Theorem
3 for a detailed discussion.

While this result shows that spatial adaptation is indeed possible in a strong
theoretical way, a drawback shared by most results in the literature on adaptive estimation
remains: The theoretical findings give no indication whatsoever as to how to choose
the numerical constants in the thresholds that feature in shrinkage- or Lepski-test-based
methods. It has become a common practice that thresholding constants are chosen
according to simulation results where simulations are drawn as if the true underlying
signal is very simple (say, uniform or piecewise constant). This practice had no
general theoretical corroboration until recently, when Spokoiny and Vial [14] gave, in a simple
Gaussian regression model, a certain justification based on the idea of 'propagation'.
The results in [14] are heavily tied to the simplicity of the model used, in particular
to the strong Gaussianity assumption employed, and to the fact that pointwise loss is
considered. In the present paper we show how the ideas of [13] generalise, subject to
some nontrivial modifications, to nonparametric density estimation. A key idea in the
proofs in [14], translated into the density estimation context, is to replace the sampling
distribution by a locally constant product measure. The 'transportation cost' of this
replacement is easy to control in the Gaussian setting of [14], but in the density estimation
case the fluctuations of the likelihood ratios between the unknown sampling distribution
and relevant locally constant product measures do not obey a Gaussian regime, but turn
out to be of Poisson type, so that the 'Gaussian intuitions' of [14] could be entirely
misleading. We show however that the main information theoretic idea of [14] remains
sound in this Poissonian setting as well: We use a Lepski-type procedure to construct
$\hat j_n(x)$, and we show that if we compute sharp thresholds for this procedure as if the
true density $f_0$ belonged to a family $\mathcal{F}$ of locally constant densities, then the resulting
estimator is spatially adaptive in sup-norm loss. In contrast to the results in [14], the
rates of convergence we obtain for the risk of the final estimator are exact rate-adaptive.

While the techniques and results of this paper generalise in principle to more complex 
estimation problems that involve in particular adaptation to higher degrees of smooth- 
ness, we prefer to stay within the simpler setting of Haar wavelets, which allows for a 
clean exposition of the main ideas. 
2 Uniform spatial adaptation using propagation methods



We will use the symbol $\|g\|_T$ to denote the supremum $\sup_{t \in T} |g(t)|$ of a function $g$ over
some set $T$, but we will still use the symbol $\|g\|_\infty$ to denote $\sup_{x \in \mathbb{R}} |g(x)|$ if no confusion
can arise.

For any $j \in \mathbb{N}$, we define a dyadic partition of $(0,1]$ into $2^j$-many disjoint subintervals
by setting $I_{j,k} = (k2^{-j}, (k+1)2^{-j}]$, $k = 0, \dots, 2^j - 1$; and for $0 < x \le 1$ we denote by
$I_{j,k(x)}$ the unique interval containing $x$. For $j \in \mathbb{N}$, $k = 1, \dots, 2^j - 1$, let $V_{j,k}$ be the space
of all bounded density functions on $\mathbb{R}$ that are constant on $I_{j,k}$. Via the local projections

$$K_{j,x}(f)(z) = \begin{cases} 2^j \int_{I_{j,k(x)}} f(y)\,dy & \text{if } z \in I_{j,k(x)}, \\ f(z) & \text{otherwise,} \end{cases}$$

we map any bounded density $f$ onto $V_{j,k(x)}$. (Note that $K_{j,x}(f)$ is indeed a density since
$K_{j,x}(f)$ and $f$ assign the same probability to the interval $I_{j,k(x)}$.) For $f \in V_{j,k(x)}$ and $j' \ge j$
we clearly have $K_{j',x}(f) = f$.

2.1 Estimation procedure 

Let $X, X_1, \dots, X_n$ be i.i.d. with bounded density $f_0 : \mathbb{R} \to [0, \infty)$, $n > 1$. We wish to
construct a single estimator which estimates $f_0(x)$ in an optimal way, uniformly so for
points $x$ in the interval $(a,b]$. We shall take without loss of generality $(a,b] = (0,1]$,
and we shall assume throughout that $f_0$ is bounded away from zero on $(0,1]$. Let
$K(x,y) = \sum_{k \in \mathbb{Z}} \phi(x-k)\phi(y-k)$ be the projection kernel based on the Haar wavelet
$\phi = 1_{(0,1]}$. We shall write $K_j(x,y) = 2^j K(2^j x, 2^j y)$, and the associated linear density
estimator is the dyadic histogram estimator given by

$$f_n(j,x) := \frac{1}{n}\sum_{i=1}^n K_j(x, X_i).$$

We make the important observation that $E_f f_n(j,x) = 2^j P_f(I_{j,k(x)})$, which directly
follows from the identity $K_j(x,y) = 2^j 1_{I_{j,k(x)}}(y)$. If $f$ is constant on $I_{j,k(x)}$ this in particular
implies $E_f f_n(j,x) = f(x)$. (In other words: for any locally (at $x$) constant density $f$ the
bias of $f_n(j,x)$ equals zero if the resolution level is chosen fine enough.)

We finally note that the estimator $f_n(j,x)$ by construction only depends on data
points falling into $I_{j,k(x)}$. This amounts to $n2^{-j}$ being the 'effective' sample size for
estimating $f_0$ at $x$.
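As an illustration of the dyadic histogram estimator described above, the following is a minimal Python sketch. It relies on the observation that the Haar projection kernel reduces the estimator to a scaled count of the observations in the half-open dyadic bin containing $x$. The function name `dyadic_histogram` and its argument names are hypothetical and chosen only for this example.

```python
import math


def dyadic_histogram(sample, j, x):
    """Linear Haar estimator: 2^j * #{i : X_i in I_{j,k(x)}} / n."""
    if not 0.0 < x <= 1.0:
        raise ValueError("x must lie in (0, 1]")
    n = len(sample)
    scale = 2 ** j
    # Index k(x) of the half-open dyadic bin (k 2^-j, (k+1) 2^-j] containing x.
    k = math.ceil(x * scale) - 1
    lo, hi = k / scale, (k + 1) / scale
    # Only observations inside this bin contribute to the estimate.
    count = sum(1 for xi in sample if lo < xi <= hi)
    return scale * count / n


if __name__ == "__main__":
    data = [0.1, 0.2, 0.3, 0.6, 0.9]
    print(dyadic_histogram(data, 1, 0.25))
```

Only the count in one bin enters the estimate, which is the sense in which roughly $n2^{-j}$ observations form the effective sample size at resolution level $j$.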






2.2 Local choice of the resolution level 

We fix $j_{\max} := j_{\max,n} \in \mathbb{N}$ satisfying $2^{-j_{\max}} \ge d(\log n)^2/n$ for some $d > 0$. For thresholds
$\zeta_n$ to be specified below, and for $J \in \mathbb{N}$, $J \le j_{\max}$ and $0 < x \le 1$, we define

$$\hat j_n(J,x) = \min\Big\{ j \in \mathbb{N},\ J \le j \le j_{\max} : \sqrt{n2^{-l}}\,|f_n(l,x) - f_n(j,x)| \le \zeta_n \sqrt{f_n(l,x)} \ \text{ for all } l,\ j < l \le j_{\max} \Big\} \qquad (1)$$

as well as

$$\hat j_n(x) = \hat j_n(0,x). \qquad (2)$$

(If the condition in (1) is not met for any $j$, $J \le j \le j_{\max}$, we set $\hat j_n(J,x) = j_{\max}$.) Given
the locally variable resolution level $\hat j_n$, we define the family of nonlinear estimators

$$\hat f_n(J,x) := f_n(\hat j_n(J,x), x), \qquad \hat f_n(x) := f_n(\hat j_n(x), x), \qquad x \in [0,1]. \qquad (3)$$

These are estimators for $f_0(x)$ based on a locally variable resolution level depending on
$x$, and they are density-analogues of the estimators introduced in [12] in the context of
the Gaussian white noise model. Note that by construction $\hat j_n(x)$ is a step function in
$x$. Introducing the parameter $J$ will be useful in what follows - effectively, $\hat f_n(J,x)$ is
a nonlinear estimator based on a search over the resolution levels $j \ge J$ that stops at
$j_{\max}$.
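The search over resolution levels described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the comparison implemented here accepts a level $j$ when, for every finer level $l$, the scaled difference $\sqrt{n2^{-l}}\,|f_n(l,x) - f_n(j,x)|$ is at most the threshold times $\sqrt{f_n(l,x)}$, and the exact normalisation should be checked against (1). The names `f_lin` and `select_level` are hypothetical.

```python
import math


def f_lin(sample, j, x):
    """Dyadic histogram estimate at level j and point x in (0, 1]."""
    scale = 2 ** j
    k = math.ceil(x * scale) - 1
    inside = sum(1 for xi in sample if k / scale < xi <= (k + 1) / scale)
    return scale * inside / len(sample)


def select_level(sample, x, zeta, j_max, J=0):
    """Smallest level j in [J, j_max] accepted against all finer levels."""
    n = len(sample)
    est = {j: f_lin(sample, j, x) for j in range(J, j_max + 1)}
    for j in range(J, j_max + 1):
        # Assumed form of the test: compare level j with every finer level l.
        accepted = all(
            math.sqrt(n * 2.0 ** (-l)) * abs(est[l] - est[j])
            <= zeta * math.sqrt(est[l])
            for l in range(j + 1, j_max + 1)
        )
        if accepted:
            return j
    # Not reached in practice: level j_max has no finer level to fail against.
    return j_max


if __name__ == "__main__":
    half = [(i + 0.5) / 128 for i in range(64)]  # all mass in (0, 0.5)
    print(select_level(half, 0.25, zeta=1.0, j_max=3))
```

With a small threshold the coarsest level is rejected for the sample concentrated on the left half of the unit interval, while a large threshold accepts it; the nonlinear estimator is then `f_lin(sample, select_level(...), x)`.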

2.3 Threshold choice by propagation 

One of the main challenges for all adaptive procedures is the choice of the thresholds $\zeta_n$
used in the tests defined in (1). Define the standardisation

$$\frac{1}{S_n(j,x)} = \begin{cases} \dfrac{1}{\sqrt{f_n(j,x)}} & \text{if } f_n(j,x) > 0; \\ & \text{otherwise.} \end{cases}$$

We suggest to choose the thresholds in such a way that the following condition is satisfied:

Condition 1 Let $\mathcal{F}_{j,k}$ be any triangular array of subsets of $V_{j,k}$, $j \le j_{\max}$,
$k = 0, \dots, 2^j - 1$, and let $k(m)$ be the unique $k$ such that $I_{j_{\max},m} \subseteq I_{j,k}$. We say that the thresholds $\zeta_n$
satisfy the uniform propagation condition UP($\alpha$, $\mathcal{F}_{j,k}$) for some fixed $\alpha > 0$ if for every
$n$, every $j \le j_{\max}$, every $m = 0, \dots, 2^{j_{\max}} - 1$, and every $f \in \mathcal{F}_{j,k(m)}$ we have that

$$E_f \Big[ \sup_{x \in I_{j_{\max},m}} \max_{j \le j' \le j_{\max}} \sqrt{\frac{n2^{-j'}}{\log n}} \left| \frac{\hat f_n(j',x) - f_n(j',x)}{S_n(j',x)} \right| \Big]^2 \le \frac{\alpha}{n2^{2j_{\max}}}. \qquad (4)$$

(Note that since $\hat j_n(j') \ge j'$ we have that $f_n(j',x) = 0$ implies $\hat f_n(j',x) = 0$ for the
fully data-driven estimator $\hat f_n(j')$, and so the error $|\hat f_n(j',x) - f_n(j',x)|$ is then $0$.)
An interpretation of this condition can be given along the following lines: For $0 <
x \le 1$ the class $\mathcal{F}_{j,k(x)}$ contains only densities $f$ that can be exactly reconstructed on
$I_{j,k(x)}$ by $\int K_j(x,y)f(y)\,dy$, so that the bias of the linear estimator $f_n(j',x)$ equals zero
locally. In particular, any choice of the resolution level finer than $j'$ will only increase
the variance without reducing the bias, and we would want $\hat j_n(j',x)$ to detect that and
equal, with large probability, $j'$. This property of $\hat j_n$ will then be mirrored in the fact
that $\hat f_n(j',x) - f_n(j',x) = 0$ for every $j' \ge j$ on an event with large probability, in which
case the l.h.s. of (4) is exactly equal to zero. The quantity $\alpha/(n2^{2j_{\max}})$ stands for the a
priori expected tolerance for a probabilistic error of $\hat j_n$ to detect the 'correct' resolution
level on each interval $I_{j_{\max},m}$ in this 'no-bias' situation.

The following lemma shows that Condition 1 is not empty and that thresholds $\zeta_n$
satisfying the uniform propagation condition exist. It shows furthermore that the
thresholds can be taken to be of order $\sqrt{\log n}$ and independent of $f$, which will be crucial in
understanding the adaptive properties of $\hat f_n$ below.

Lemma 1 Let $\mathcal{F}_{j,k}$ equal $V_{j,k}$ intersected with the set

$$\Big\{ f : 0 < \delta \le \inf_{0 < x \le 1} f(x),\ \|f\|_\infty \le M \Big\}$$

for some fixed $0 < \delta, M < \infty$. Then for every given $\alpha > 0$ there exists a numerical
constant $\kappa > 0$ that depends only on $\alpha$ such that for any threshold choice

$$\zeta_n \ge \kappa \sqrt{\log n}$$

the uniform propagation condition UP($\alpha$, $\mathcal{F}_{j,k}$) is at least satisfied for $n$ larger than some
index that only depends on $\delta$ and $M$.

While Lemma 1 proves the existence of thresholds of the order $\sqrt{\log n}$ under the
uniform propagation condition - a fact that will be seen to imply adaptivity of $\hat f_n$ below
- it does not suggest a practical choice of $\zeta_n$. Instead, this choice can be made by
direct evaluation of (4), as follows: Condition 1 only concerns the local error bounds
over small intervals $I_{j_{\max},m}$ on which the function $f$ is constant, which effectively means
that it suffices to check this condition only for classes of densities which are constant
on the interval of interest. The particular choice of the interval $I_{j_{\max},m}$ is unimportant.
Secondly, all quantities in Condition 1 depend on known quantities after $f$ is chosen.
By construction of the estimators $f_n$ and $\hat f_n$ the random variable featuring in (4) - we
call it $T$ - only depends on the number of data points falling into each of the (uniquely
determined) $j'$-fine intervals containing $I_{j_{\max},m}$. This observation allows for an easy
computation of the l.h.s. of (4) along the following lines: Fix $0 < p < 1$. Then, for any
$f \in \mathcal{F}_{j,k(m)}$ satisfying $2^{-j} f = p$ on $I_{j,k(m)}$, the number $Z$ of observations falling into the
interval $I_{j,k(m)}$ is binomial $B(n,p)$. Conditionally on $Z = k$, take $k$-many independent
random variables that are uniform on $I_{j,k(m)}$ and count the number of observations $V_{j'}$
in each of the $j'$-fine intervals. Then compute $f_n$, $\hat f_n$, and $T$. This shows that $T$
depends only on $V_{j'}$, $j \le j' \le j_{\max}$, and that the l.h.s. of (4) is therefore equal to
$E[E[T(V_j, \dots, V_{j_{\max}}) \mid Z]]$.

The practical choice of $\zeta_n$ can then be obtained via a Monte Carlo simulation of
(4) by choosing $\zeta_n$ as the smallest threshold for which (4) is satisfied in the simulation
for one specific interval $I_{j_{\max},m}$ uniformly over the class of all densities constant on this
interval. Given $j_{\max}$ and $\alpha$, this procedure has to be performed only for one fixed interval
$I_{j_{\max},m}$, and then applies for every $m$ simultaneously.
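The sampling step of this Monte Carlo scheme can be sketched in Python as below. The sketch only generates the nested bin counts and the resulting linear estimates along one chain of dyadic intervals; it does not evaluate the statistic $T$ or calibrate a threshold. It uses the fact that, conditionally on the count in a bin, uniform points fall into a given half of that bin independently with probability 1/2. The function names and the use of `random.Random` are choices made for this illustration.

```python
import random


def simulate_nested_counts(n, p, j, j_max, rng):
    """Counts V_j >= V_{j+1} >= ... >= V_{j_max} in nested dyadic bins."""
    # Z ~ Binomial(n, p): observations landing in the level-j bin.
    z = sum(1 for _ in range(n) if rng.random() < p)
    counts = [z]
    for _ in range(j, j_max):
        # Given the parent count, each point lies in the relevant half of the
        # parent bin independently with probability 1/2.
        parent = counts[-1]
        counts.append(sum(1 for _ in range(parent) if rng.random() < 0.5))
    return counts


def linear_estimates(counts, n, j):
    """Histogram values 2^{j'} V_{j'} / n for j' = j, j+1, ..."""
    return [2 ** (j + i) * v / n for i, v in enumerate(counts)]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = simulate_nested_counts(200, 0.25, 2, 6, rng)
    print(counts, linear_estimates(counts, 200, 2))
```

Repeating the draw many times for a grid of values of $p$ and feeding the estimates into the level-selection rule would give a simulated version of the left-hand side of (4) for each candidate threshold.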

2.4 Local small bias condition 

The idea behind Condition 1 is that we take 'idealised' classes of densities $\mathcal{F}$ for which
we compute sharp thresholds $\zeta_n$. The danger arises that the true density $f_0$ may be very
different from the elements in $\mathcal{F}$, which may lead to wrong thresholds (and inference).
We have to assess the error that comes from replacing $f_0$ by an element from $\mathcal{F}$, in a
neighborhood of a given point $x$. This can be fundamentally quantified in terms of the
log-likelihood ratio between $f_0$ and its local (at $x$) approximand in $\mathcal{F}$. As we shall see,
one of the deeper reasons behind the fact that propagation methods imply adaptation
results is that this error can be related to the usual bias term in linear estimation.

Condition 2 Given real numbers $\Delta_{j,x}$, $0 < x \le 1$, $j \in \mathbb{N} \cup \{0\}$ satisfying $\Delta_{l',x} \le \Delta_{l,x}$
for every $l' > l$, we say that $f_0$ satisfies the local small bias condition at $x \in (0,1]$ and
with $\Delta_{j,x} = \Delta_{j,x}(f_0)$ if

$$\mathrm{Var}_{K_{j,x}(f_0)}\Big( \log \frac{f_0}{K_{j,x}(f_0)}(X) \Big) \le \Delta_{j,x}(f_0)$$

for all $j \in \mathbb{N}$.



The local 'cost' of transporting a product measure $\prod_{i=1}^n f_0(x_i)$ to $\prod_{i=1}^n K_{j,x}(f_0)(x_i)$
can be quantified by $n$ times the variance featuring in the above condition, and we shall
have to restrict ourselves to resolution levels $j$ for which this transportation cost is at
most a fixed constant times the logarithm of the sample size $n$. The smallest resolution
level for which this is still the case will be defined as $j^*(x)$: More precisely, for some
fixed positive constant $A$, define the local resolution level

$$j^*(x) := j^*(x,n,A,f_0) = \min\{ j \in \mathbb{N} : j \le j_{\max},\ n\Delta_{j,x}(f_0) \le A \log n \}. \qquad (5)$$

While this is an information-theoretic definition of $j^*$, a key observation of this subsection
is that it has the classical 'bias-variance' tradeoff generically built into it for suitable
choices of $\Delta_{j,x}(f_0)$.

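Lemma 2 Suppose $f_0$ is bounded by some finite number $M > 0$ and that

$$\inf_{0 < x \le 1} f_0(x) \ge \delta > 0.$$

Then $f_0$ satisfies Condition 2 with

$$\Delta_{j,x}(f_0) = \frac{M}{\delta^2}\, 2^{-j}\, \|f_0 - K_{j,x}(f_0)\|_\infty^2.$$

Proof. First, observe that $K_{j,x}(f_0)$ is bounded by $M$ and bounded below by $\delta >
0$. Then, using that $K_{j,x}(f_0)$ coincides with $f_0$ outside of $I_{j,k(x)}$ and the inequality
$|\log x - \log y| \le \max(x^{-1}, y^{-1})|x - y|$, we get

$$\mathrm{Var}_{K_{j,x}(f_0)}\Big( \log \frac{f_0}{K_{j,x}(f_0)}(X) \Big)
\le \int \max\big(f_0(y)^{-2}, K_{j,x}(f_0)(y)^{-2}\big)\,\big(f_0(y) - K_{j,x}(f_0)(y)\big)^2 K_{j,x}(f_0)(y)\,dy$$
$$\le \frac{M}{\delta^2} \int_{I_{j,k(x)}} \big(f_0(y) - K_{j,x}(f_0)(y)\big)^2\,dy \le \frac{M}{\delta^2}\, 2^{-j}\, \|K_{j,x}(f_0) - f_0\|_\infty^2.$$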

The lemma shows that the quantity $(n/\log n)\Delta_{j,x}(f_0)$ can be viewed as the square
of the 'bias divided by the variance' of linear projection estimators for $f_0(x)$. Hence,
to choose the smallest $j \le j_{\max}$ such that $(n/\log n)\Delta_{j,x}(f_0)$ is still bounded by a fixed
constant $A$ means to locally balance the 'variance' and 'bias' term in the nonparametric
setting.

To be more concrete, let us briefly discuss what this means in the classical situation
where the bias is bounded by local regularity properties of the unknown density $f_0$.
Since we are interested in spatial adaptation, we wish to take locally inhomogeneous
smoothness into account by appealing to local Hölder conditions: Let $0 < t \le 1$ and let
us say that a function $g : \mathbb{R} \to \mathbb{R}$ is locally $t$-Hölder at $x \in \mathbb{R}$ if for some $\eta > 0$

$$\sup_{0 < |m| \le \eta} \frac{|g(x+m) - g(x)|}{|m|^t} < \infty.$$

Define further a 'local' Hölder ball of bounded functions

$$C(t,x,L,\eta) := \Big\{ g : \mathbb{R} \to \mathbb{R},\ \max\Big( \|g\|_\infty,\ \sup_{0 < |m| \le \eta} \frac{|g(x+m) - g(x)|}{|m|^t} \Big) \le L \Big\}.$$

Condition 2 then has the following more classical interpretation in terms of local
smoothness properties of $f_0$:

Lemma 3 If $f_0 \in C(t,x,L,\eta)$ for some $0 < t \le 1$, then the local bias $\|f_0 - K_{j,x}(f_0)\|_\infty$
is bounded by $c2^{-jt}$ for some constant $c = c(t,L,\eta)$. Furthermore, if

$$\inf_{0 < x \le 1} f_0(x) \ge \delta > 0,$$

then Condition 2 is satisfied with

$$\Delta_{j,x}(f_0) = c^2 \frac{L}{\delta^2}\, 2^{-j(2t+1)}. \qquad (6)$$



Proof. Let $y \in I_{j,k(x)}$ be arbitrary. Then, using the substitution $2^j z = 2^j y - u$,

$$|f_0(y) - K_{j,x}(f_0)(y)| = \Big| 2^j \int_{I_{j,k(x)}} (f_0(y) - f_0(z))\,dz \Big|
\le \int_{-1}^{1} |f_0(y) - f_0(x) + f_0(x) - f_0(y - 2^{-j}u)|\,du$$
$$\le 2|f_0(y) - f_0(x)| + \int_{-1}^{1} |f_0(x) - f_0(y - 2^{-j}u)|\,du.$$

By definition of $x$, $y$, $I_{j,k(x)}$ we have $|y - x| \le 2^{-j}$, and also $|y - 2^{-j}u - x| \le 2^{-j+1}$ by
the triangle inequality, so that for $2^{-j+1} \le \eta$ the last quantity is bounded by $c_0 2^{-jt}$ in
view of $f_0 \in C(t,x,L,\eta)$. If $2^{-j} > \eta/2$, then the quantity in the last display can still be
bounded by $6\|f_0\|_\infty \le 6L$, so that choosing $c_1 = 6L(2/\eta)^t$ establishes the desired bound
for $c = \max(c_0, c_1)$. To prove the second claim, apply Lemma 2. ■

Using the bound from the last lemma to verify Condition 2 we see that, by definition
of $j^*(x)$ and for $f_0 \in C(t,x,L,\eta)$,

$$\sqrt{\frac{n2^{-j^*(x)}}{\log n}} \simeq \Big( \frac{n}{\log n} \Big)^{\frac{t}{2t+1}} \qquad (7)$$

is the locally (at $x$) optimal adaptive rate of convergence, so that the local small bias
condition constructs a minimax optimal resolution level $j^*(x)$ at every $x \in [0,1]$.
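To make the balancing in (5) concrete for a bias bound of the Hölder form $c' 2^{-j(2t+1)}$, the following Python sketch searches for the smallest admissible level. It is a hypothetical helper written for this illustration: the constant `c_prime` stands for the unspecified constant in front of $2^{-j(2t+1)}$, and the search is assumed to start at level 0.

```python
import math


def oracle_level(n, t, c_prime, A, j_max):
    """Smallest j <= j_max with n * c' * 2^{-j(2t+1)} <= A * log(n).

    Returns None when no level up to j_max satisfies the inequality.
    """
    for j in range(0, j_max + 1):
        cost = n * c_prime * 2.0 ** (-j * (2 * t + 1))
        if cost <= A * math.log(n):
            return j
    return None


if __name__ == "__main__":
    # Smoother densities (larger t) allow coarser bins for the same n.
    print(oracle_level(1000, 1.0, 1.0, 0.4, 10))
    print(oracle_level(1000, 0.5, 1.0, 0.4, 10))
```

For $n = 1000$, $A = 0.4$ and $c' = 1$ the search stops at level 3 when $t = 1$ and at level 5 when $t = 1/2$, reflecting that rougher local behaviour calls for finer bins.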

2.5 Main results 

We now state the main results, starting with the following 'oracle' inequality. Note that
the oracle $f_n(j^*(x),x)$ is not an estimator in itself as it depends on unknown quantities.

Theorem 1 Let $\hat f_n(\cdot)$ be the density estimator defined in (3) with thresholds $\zeta_n$ that
satisfy the uniform propagation condition UP($\alpha$, $\mathcal{F}_{j,k}$) for some $\mathcal{F}_{j,k}$. Suppose $f_0$ satisfies
Condition 2 for every $0 < x \le 1$, and let $j^*(x)$ be as in (5). Then we have

$$E_{f_0} \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}} \left| \frac{\hat f_n(x) - f_n(j^*(x),x)}{S_n(j^*(x),x)} \right| \le \frac{\zeta_n}{\sqrt{\log n}} + \sqrt{\frac{\alpha}{n}}\, n^{A e^{U}} \qquad (8)$$

for any $U$ satisfying

$$U \ge \sup_{0 < x \le 1} \left\| \log \frac{f_0}{K_{j^*(x),x}(f_0)} \right\|_\infty. \qquad (9)$$

If $\zeta_n = O(\sqrt{\log n})$ - as follows under the conditions of Lemma 1 - and if one chooses
$A < 1/2$, $U$ as in the remark below, then the r.h.s. of (8) is $O(1)$ as $n$ tends to infinity.
Theorem 1 thus implies that the estimator $\hat f_n$ with resolution levels chosen by the
propagation approach is close to the linear 'oracle estimator' evaluated at the locally optimal
resolution level $j^*(x)$, and this uniformly so on $(0,1]$.
Remark 1 If $\mathcal{F}_{j,k}$ is as in Lemma 1 and $f_0$ is bounded by $M$ and bounded below by $\delta$,
we may apply Lemma 2 (using $2^{-j_{\max}} \ge d(\log n)^2/n$) to obtain the bound

$$\left\| \log \frac{f_0}{K_{j^*(x),x}(f_0)} \right\|_\infty \le \frac{1}{\delta} \big\| f_0 - K_{j^*(x),x}(f_0) \big\|_\infty \le \sqrt{\frac{A}{dM \log n}},$$

which tends to zero as $n$ tends to infinity.

Our results then imply the following uniform spatial adaptation result: 

Theorem 2 Assume that $f_0$ is bounded by $M$ and satisfies $\inf_{0 < x \le 1} f_0(x) \ge \delta > 0$. Let
$\hat f_n(\cdot)$ be the density estimator from (3) with thresholds $\zeta_n = O(\sqrt{\log n})$ that satisfy the
uniform propagation condition UP($\alpha$, $\mathcal{F}_{j,k}$) for $\mathcal{F}_{j,k}$ as in Lemma 1. Let $j^*(x)$ be as in
(5) with $A < 1/2$ and with $\Delta_{j,x}$ as in Lemma 2. Then

$$\sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}}\, \big| \hat f_n(x) - f_0(x) \big| = O_{\Pr_{f_0}}(1). \qquad (10)$$

Thus the fully data-driven estimator $\hat f_n$ for $f_0$ achieves the locally optimal risk of the
'oracle' based on $j^*(x)$, uniformly at all points in $(0,1]$. If $j^*(x)$ - with $A < 1/2$ - is
based on $\Delta_{j,x}$ as in Lemma 3, then (10) holds true and the 'oracle' rate is the adaptive
locally minimax rate of convergence at every $0 < x \le 1$ where $f_0$ is locally $t$-Hölder with
$0 < t \le 1$, see the discussion in Section 2.4 surrounding (7). This means that at any
given point $x$ our estimator is rate-adaptive to local Hölder smoothness (with the usual
$\log n$ penalty for adaptation).

One may ask further if spatial adaptation in the minimax sense occurs uniformly for
every $x \in (0,1]$. A consequence of Theorem 2 is the following.

Theorem 3 Suppose the assumptions of Theorem 2 are satisfied and that the true
density $f_0$ lies in $C(t(x),x,L(x),\eta(x))$, $0 < x \le 1$, for some $t(\cdot)$, $L(\cdot)$, $\eta(\cdot)$ that are bounded
and uniformly bounded away from zero on $(0,1]$. Let $j^*(x)$ be as in (5) with $A < 1/2$
and with $\Delta_{j,x}$ as in Lemma 3. Then

$$\sup_{0 < x \le 1} \Big( \frac{n}{\log n} \Big)^{t(x)/(2t(x)+1)} \big| \hat f_n(x) - f_0(x) \big| = O_{\Pr_{f_0}}(1).$$

The assumptions on the functions $t$, $L$, $\eta$ need discussion. For densities that locally
look like $|x - x_m|^{\alpha_m}$ we would wish to choose $t(x)$ equal to their pointwise Hölder
exponents $t(x_m) = \alpha_m$ and $t(x) = 1$ otherwise, but then $\eta$ is not uniformly bounded
away from zero for points $x \to x_m$. However, Theorem 3 holds for any choice of the
functions $t$, $L$, $\eta$ for which $f_0$ satisfies $f_0 \in C(t(x),x,L(x),\eta(x))$, $0 < x \le 1$. In other
words, in the above example we can choose $t(x) = \alpha_m$ on the interval $(x_m - \eta_0, x_m + \eta_0)$
and $t(x) = 1$ otherwise, where $\eta_0$ is some arbitrary lower bound for $\eta(x)$. This comes at
the expense of not being adaptive near $x_m$, i.e., for $x \in (x_m - \eta_0, x_m + \eta_0) \setminus \{x_m\}$, which is
sensible as we cannot expect adaptation for points $x$ arbitrarily close to $x_m$ from a finite
sample. Inspection of the proofs (particularly the dependence on $\eta$ in Lemma 3) shows
that, for fixed $n$, the above theorem holds for densities in $C(t(x),x,L(x),\eta_{(n)})$, $0 < x \le 1$,
where $\eta_{(n)}$ can be taken of order $n^{-1/3}$, the binwidth corresponding to the maximal
smoothness $t = 1$ one wants to adapt to in our setting, and this is again reasonable:
Hölder smoothness of $f_0$ in an interval $[x \pm r_n]$ where $r_n = o(n^{-1/3})$ does not allow one to
control the bias at $x$ with the locally optimal binwidth of order $n^{-1/3}$. By the same
arguments multifractal densities $f_0$ which change their Hölder exponent continuously
can be handled by taking $t(x)$ piecewise constant on a partition of $(0,1]$ into bins of size
of order $n^{-1/3}$, the estimator achieving the local uniform minimax rate on each bin of
the partition.



3 Proofs 

3.1 Proof of Theorem 1

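A first idea is to use a moment bound, localised at any point $x$ of estimation, on the
log-likelihood ratio between $f_0$ and its approximand in $V_{j,k}$.

Lemma 4 If, for fixed $0 < x \le 1$,

$$\mathrm{Var}_{K_{j,x}(f_0)}\Big( \log \frac{f_0}{K_{j,x}(f_0)}(X) \Big) \le \frac{D \log n}{n}$$

for some $0 < D < \infty$ and every $n \in \mathbb{N}$, then, for every $n \in \mathbb{N}$,

$$E_{K_{j,x}(f_0)} \Big( \prod_{i=1}^n \frac{f_0(X_i)}{K_{j,x}(f_0)(X_i)} \Big)^2 \le n^{2De^{U}}$$

holds for any $U$ satisfying

$$U \ge \left\| \log \frac{f_0}{K_{j,x}(f_0)} \right\|_\infty. \qquad (11)$$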
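Proof. Since the Kullback-Leibler distance

$$\mathcal{K}(f_0, K_{j,x}(f_0)) = -E_{K_{j,x}(f_0)} \log \frac{f_0}{K_{j,x}(f_0)}(X) \ge 0$$

is non-negative, we have

$$E_{K_{j,x}(f_0)} \Big( \prod_{i=1}^n \frac{f_0(X_i)}{K_{j,x}(f_0)(X_i)} \Big)^2 \le \Big( E_{K_{j,x}(f_0)} \exp\Big\{ 2\Big( \log \frac{f_0}{K_{j,x}(f_0)}(X) - E_{K_{j,x}(f_0)} \log \frac{f_0}{K_{j,x}(f_0)}(X) \Big) \Big\} \Big)^n$$

by the i.i.d. assumption. Using the power series expansion of the exponential function
and that the variables in the exponent are centered, one easily bounds the previous
display by

$$\Big( 1 + \frac{2De^{U} \log n}{n} \Big)^n \le e^{2De^{U} \log n} = n^{2De^{U}}.$$ ■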

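Here is the proof of Theorem 1. We first note that Condition 2 allows us to take
$\Delta_{j,x}(f_0)$ to be constant on the intervals $I_{j,k}$. Consequently, $j^*(x)$ from (5) is then constant
on every interval $I_{j_{\max},m}$, and we set

$$j^*_m = \sup_{x \in I_{j_{\max},m}} j^*(x).$$

To prove the theorem, we split

$$E_{f_0} \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}} \left| \frac{\hat f_n(x) - f_n(j^*(x),x)}{S_n(j^*(x),x)} \right|$$
$$\le E_{f_0} \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}} \left| \frac{\hat f_n(x) - f_n(j^*(x),x)}{S_n(j^*(x),x)} \right| 1_{\{\hat j_n(x) \le j^*(x)\}}$$
$$\quad + E_{f_0} \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}} \left| \frac{\hat f_n(x) - f_n(j^*(x),x)}{S_n(j^*(x),x)} \right| 1_{\{\hat j_n(x) > j^*(x)\}} =: I + II$$

according to whether $\hat j_n(x)$ comes to lie below the local resolution level $j^*(x)$ or not. By
definition of $\hat j_n(x)$ in (1) one immediately has

$$I \le \frac{\zeta_n}{\sqrt{\log n}}.$$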
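About $II$: Define

$$D_m = \sup_{x \in I_{j_{\max},m}} \max_{j^*_m \le j \le j_{\max}} \sqrt{\frac{n2^{-j}}{\log n}} \left| \frac{\hat f_n(j,x) - f_n(j,x)}{S_n(j,x)} \right|. \qquad (12)$$

Using that on the event $\hat j_n(x) > j^*(x)$ we necessarily have $\hat f_n(x) = \hat f_n(j^*(x),x)$, we see
that

$$II \le E_{f_0} \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}} \left| \frac{\hat f_n(j^*(x),x) - f_n(j^*(x),x)}{S_n(j^*(x),x)} \right|$$
$$\le E_{f_0} \max_m \sup_{x \in I_{j_{\max},m}} \max_{j^*_m \le j \le j_{\max}} \sqrt{\frac{n2^{-j}}{\log n}} \left| \frac{\hat f_n(j,x) - f_n(j,x)}{S_n(j,x)} \right| \le 2^{j_{\max}} \max_m E_{f_0} D_m. \qquad (13)$$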

We use the Cauchy-Schwarz inequality to bound

$$E_{f_0} D_m = \int \cdots \int D_m(x_1, \dots, x_n) \prod_{i=1}^n f_0(x_i)\,dx_1 \cdots dx_n$$
$$= \int \cdots \int D_m(x_1, \dots, x_n) \prod_{i=1}^n \frac{f_0(x_i)}{K_{j^*_m,x}(f_0)(x_i)} \prod_{i=1}^n K_{j^*_m,x}(f_0)(x_i)\,dx_1 \cdots dx_n$$

by the square-root of the second moment of $D_m$ under the 'idealised' density $K_{j^*_m,x}(f_0)$
times the square-root of the second moment of the likelihood ratio. (Here, $x$ is any point
in $I_{j_{\max},m}$.) Using Condition 1 and Lemma 4 we obtain a bound for the last term in
(13) of order

$$2^{j_{\max}} \max_m E_{f_0} D_m \le \sqrt{\frac{\alpha}{n}}\, n^{A e^{U}},$$

which concludes the proof of the theorem.
3.2 Proof of Theorems 2 and 3

We first prove Theorem 2. Clearly,

$$\sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}}\, \big| \hat f_n(x) - f_0(x) \big|
\le \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}} \left| \frac{\hat f_n(x) - f_n(j^*(x),x)}{S_n(j^*(x),x)} \right| \sqrt{f_n(j^*(x),x)}$$
$$\quad + \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}}\, \big| f_n(j^*(x),x) - f_0(x) \big|.$$

The first factor of the first summand is bounded in probability in view of Theorem 1 and
of Lemma 1 and the hypothesis $\zeta_n = O(\sqrt{\log n})$. The second factor of the first summand
is also bounded in probability since

$$\sup_{0 < x \le 1} \max_{j \le j_{\max}} \big| f_n(j,x) - E_{f_0} f_n(j,x) \big| = o_{\Pr_{f_0}}(1)$$

by Proposition 2, using $2^{-j_{\max}} \ge d(\log n)^2/n$, and since $\sup_{x,j} |E_{f_0} f_n(j,x)| \le \|f_0\|_\infty <
\infty$. It remains to prove that the second summand is bounded in probability, and we
achieve this by bounding the moment

$$E_{f_0} \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}}\, \big| f_n(j^*(x),x) - E_{f_0} f_n(j^*(x),x) \big| + \sup_{0 < x \le 1} \sqrt{\frac{n2^{-j^*(x)}}{\log n}}\, \big| E_{f_0} f_n(j^*(x),x) - f_0(x) \big|$$
$$\le E_{f_0} \sup_{0 < x \le 1} \max_{j \le j_{\max}} \sqrt{\frac{n2^{-j}}{\log n}}\, \big| f_n(j,x) - E_{f_0} f_n(j,x) \big| + \max_m \sup_{x \in I_{j_{\max},m}} \sqrt{\frac{n2^{-j^*(x)}}{\log n}}\, \big| E_{f_0} f_n(j^*(x),x) - f_0(x) \big|.$$

The first term is bounded by a fixed constant using Proposition 2 below. Recalling the
definition of $j^*_m$ from the beginning of the proof of Theorem 1 and choosing $\Delta_{j,x}(f_0)$
from Lemma 2, the second term is bounded by

$$\max_m \sqrt{\frac{n2^{-j^*_m}}{\log n}} \sup_{x \in I_{j_{\max},m}} \big| E_{f_0} f_n(j^*_m,x) - f_0(x) \big| \le \max_m \sqrt{\frac{n2^{-j^*_m}}{\log n}}\, \big\| K_{j^*_m,x}(f_0) - f_0 \big\|_\infty,$$

where $x$ is any point in $I_{j_{\max},m}$, and this completes the proof.

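We next prove Theorem 3. Using the hypotheses on $t(\cdot)$, $L(\cdot)$, $\eta(\cdot)$, the proof of
Lemma 3 shows that $f_0$ satisfies Condition 2 with

$$\Delta_{j,x}(f_0) = c'\, 2^{-j(2t(x)+1)}, \qquad (14)$$

$0 < c' < \infty$, where $c'$ does not depend on $x$. Using that $t(\cdot)$ is bounded below by some
positive number implies that

$$\Delta_{j_{\max},x}(f_0) \le \frac{A \log n}{n}$$

holds for $n$ large enough (independent of $x \in (0,1]$), so that $j^*(x)$, when based on
$\Delta_{j,x}(f_0)$ as in (14), is asymptotically equivalent to the minimax optimal locally adaptive
rate, uniformly so for all $x$.

3.3 Proof of Lemma 1

The proof relies on Propositions 1 and 2 which are given below. Recall first from
Section 2.1 that for $f \in V_{j,k}$ and $j' \ge j$ we necessarily have $E_f f_n(j',x) - f(x) = 0$ for every
$x \in I_{j,k}$, so the bias at $x \in I_{j,k}$ is exactly zero, a fact we shall use repeatedly below
without separate mention. Write

$$\sup_{x \in I_{j_{\max},m}} \max_{j \le j' \le j_{\max}} \sqrt{\frac{n2^{-j'}}{\log n}} \left| \frac{\hat f_n(j',x) - f_n(j',x)}{S_n(j',x)} \right|
= \sup_{x \in I_{j_{\max},m}} \max_{j \le j' \le j_{\max}} \sqrt{\frac{n2^{-j'}}{\log n}} \sum_{l > j'} \left| \frac{f_n(l,x) - f_n(j',x)}{S_n(j',x)} \right| 1_{\{\hat j_n(j',x) = l\}}. \qquad (15)$$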



To treat the indicator, observe that 

{Jnif,^) = 1}^ [V^\fnil',x) - fn{l " h x)\ > CnVfn{l-l,x) for SOme > /} 
C J V^\fn{l', X) - EfUl\ X) + EfUl - 1, X) - fn{l - 1, X)| 



> Cn — I' >l\^ \ min \flJI~x) < — 



Observe that the first set is a subset of 



Vn2-'' \ fn{l',x)-Effn{l',x)\ > "^"Vj^ for some l' > /| 

U I Vn2-('-l) \fn{l -1,X)- Effn{l - 1,X)| > 



Cn/fM 



and that, using y > \/6y for y > S, the second set is contained in 



max \fnii, x) - Effnii,x)\ > 



C Jrnax|/„(£,x)-i?//„(£,x)| > ^^^^ 



1,k(m) 

sup > }:=B2; 

so that $\{\hat j_n(j',x) = l\} \subseteq B_1 \cup B_2 =: B$, a set which does not depend on $l$, $x$ or $j'$. Hence,
$1_{\{\hat j_n(j',x) = l\}} \le 1_B$ uniformly in $l$, $x$, $j'$, so that the quantity in (15) is bounded from above
by

fn{l,x) - fn{f,x) 



\n2-y sr^ 

1 B sup max \ \ > 

xGl y>j V logn ^ 



Sn(j',x) 
and therefore the second moment of (15) is bounded, using the Cauchy-Schwarz
inequality, by



1/2 



sup max \ I > 

xGi i'>3 V logn ^ 



fn{l,x) - fn{j',x) 



Sn(.j',x) 



=: / X //. 



4,Prf 



We first bound $II$: By the triangle inequality and since the bias is exactly zero, this
term is less than or equal to



sup max \ — 
tgj j'>j \ logn ^-^ 



E 

i>j- 



fnihx) - Effn{l,x) 



SnU',x) 



+2 



In2-f 



sup max \ ; 



fn{j',x) - Effn{j',x) 



Sn{j',x) 



(16) 



4,Prf 



Define now S = {sup^^j. ^ minj/>j fn{j' ,x) > 6/2}. Note that, by definition of fn{j'), 
fnij',x) > implies /n(j',x) > 2^' /n. Then, for every 1 < p < oo. 



Ef sup max — — — - 



Ef \ sup max — — — -(I5 + l5<= 



< 



23p/2- 



+2P-in?'/2i?^l J sup mm !/„(/, x) - EfUj\x) + < ^ I 



< 



< 



+ 2?'-V/2prJ sup \f^{f,x)-EfUj',x)\>- 



23p/2- 
23p/2-l 

+2P-inP/2pr^ 
93p/2-l 

<^-^+2P-V/2cn-^l°g' 



sup 



^/^|/„(/,x)-i?//„(/,x)| 



> 



lofi 



n 



for large $n$ in view of Proposition 1 (using that $2^{-j_{\max}} \ge d(\log n)^2/n$), so that this
expectation is bounded uniformly in $n$ by some constant $c_1(p,\delta,M)$. Using this, the
Cauchy-Schwarz inequality and Proposition 2, the square of the first term in (16) is less
than or equal to



"'^^ J max 



/n2~' 1 
xEf I sup maxW- \fn{l,x) - Effn{l,x)\ sup max — — — 



8x 1/2 



< C222^— j^,, I I sup max J f-^lUl, x) - Effn{l, x)\ 
X I ( sup max — ^ — - | I < C32^-''""''/ 



max J 



and the same reasoning also implies that the second term in (16) is less than or equal to
some constant, so that we can conclude, using the lower bound of $2^{-j_{\max}}$, that

$$II \le C_4 n \qquad (17)$$

for some fixed constant $C_4$ that depends only on $\delta$ and $M$.

To bound $I$, we have the following: First, using Proposition 1 below, we see



PTf{B,) = Pr;<;max^/;^||/„ffl-£;^/„.ffl||,^„> V ^ ■ 



< Dn'TD (18) 

for large $n$, with $D$ only depending on $M$. Furthermore, using $2^{-j_{\max}} \ge d(\log n)^2/n$ and
Proposition 1 below,

Pr/(S2) 

<PrJ sup V^ynii,x)-EfUi,x)\> 

for large $n$. Thus, choosing $\kappa$ large enough but finite depending on the choice of $\alpha$, we
obtain for $n$ large enough



I xll < c^Dn I n-'is- + ) < 



a 



This completes the proof. 



3.4 Uniform-in-bandwidth bounds for Haar wavelet density estimators
and some consequences

The following exponential inequality was used repeatedly in the proofs. 

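Proposition 1 Let $j_{\max} \in \mathbb{N}$ be such that $2^{-j_{\max}} \ge d(\log n)^2/n$. Let $I = (2^{-j}k, 2^{-j}(k+1)]$
for some $j \le j_{\max}$ and $k \in \mathbb{Z}$, and suppose $f : \mathbb{R} \to [0, \infty)$ is a density that satisfies
$\|f\|_I \le M$ and

$$\inf_{x \in I} f(x) \ge \delta > 0.$$

There exist constants $C_1(d)$, $C_2(d)$ and an index $n(\delta,M)$ such that for all $n \ge n(\delta,M)$
and all $C_3 \ge C_2(d)$, if

$$C_1(d) \sqrt{\|f\|_I \log n} \le u \le C_3 \|f\|_I \sqrt{n2^{-j_{\max}}}, \qquad (19)$$

then

$$\Pr_f \Big\{ \sup_{x \in I} \max_{j \le j' \le j_{\max}} \sqrt{n2^{-j'}}\, \big| f_n(j',x) - E_f f_n(j',x) \big| \ge u \Big\} \le De^{-u^2/D},$$

where $D$ only depends on $C_3$ and $M$.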



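Proof. Writing

$$\sqrt{n2^{-j'}}\, \big| f_n(j',x) - E_f f_n(j',x) \big| = 2\sqrt{\frac{2^{j_{\max}}}{n}} \left| \sum_{i=1}^n \frac{\sqrt{2^{j'-j_{\max}}}}{2} \big( K(2^{j'}x, 2^{j'}X_i) - E_f K(2^{j'}x, 2^{j'}X_i) \big) \right|$$

we have to consider the supremum

$$2\sqrt{\frac{2^{j_{\max}}}{n}} \sup_{h \in \mathcal{H}} \left| \sum_{i=1}^n \big( h(X_i) - E_f h(X_i) \big) \right|$$

of the (scaled) empirical processes indexed by the class of functions

$$\mathcal{H} = \Big\{ \frac{\sqrt{2^{j'-j_{\max}}}}{2} K(2^{j'}x, 2^{j'}(\cdot)) : x \in I,\ j \le j' \le j_{\max} \Big\}.$$

This class has constant envelope $1/2$ since $j' \le j_{\max}$ and since $\sup_{x,y} |K(x,y)| = 1$.
Furthermore, noting that $K^2(x,y) = K(x,y)$ for every $x$, $y$, we have for $h \in \mathcal{H}$ that

$$E_f h^2(X) = \frac{2^{j'-j_{\max}}}{4} \int K^2(2^{j'}x, 2^{j'}y) f(y)\,dy = \frac{2^{j'-j_{\max}}}{4} \int_{2^{-j'}k(x)}^{2^{-j'}(k(x)+1)} f(y)\,dy \le \frac{2^{-j_{\max}} \|f\|_I}{4}.$$

Note further that $\mathcal{H}$ is a VC-type class of functions by using Lemma 2 in [7] and a simple
computation on covering numbers (including an obvious covering of the set $[2^{-j_{\max}}, 1] \subset
[0,1]$). Rewrite

$$\Pr_f \Big\{ \sup_{x \in I} \max_{j \le j' \le j_{\max}} \sqrt{n2^{-j'}}\, \big| f_n(j',x) - E_f f_n(j',x) \big| \ge u \Big\}
= \Pr_f \Big\{ \sup_{h \in \mathcal{H}} \Big| \sum_{i=1}^n \big( h(X_i) - E_f h(X_i) \big) \Big| \ge \frac{u \sqrt{n2^{-j_{\max}}}}{2} \Big\}$$

and apply expression (21) in [8], with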



4 4 



and 



A := < 



ci{d) 



log n 



if 



< 1, 



C2{d) otherwise 



for appropriate constants ci{d), C2{d) that only depend on d. ■ 

Proposition 2 Let $j_{\max}$, $I$ and $f$ be as in Proposition 1. Then there exists a constant
$D(d,\delta,M)$ such that for every $1 \le p < \infty$ we have

$$E_f \Big( \sup_{x \in I} \max_{j \le j' \le j_{\max}} \sqrt{\frac{n2^{-j'}}{\log n}}\, \big| f_n(j',x) - E_f f_n(j',x) \big| \Big)^p \le D^p. \qquad (20)$$

Proof. The proof follows from considering the same empirical process as in the proof
of Proposition 1 and using bounds for $p$-th moments of empirical processes indexed by
uniformly bounded VC-classes of functions, e.g., the bound in the display following (21)
in [8], with $\sigma^2$ and $A$ as in the proof of Proposition 1, together with Proposition 3.1 in
0. ■ 

Acknowledgement. The authors would like to thank an anonymous referee for critical
remarks, particularly on Theorem 3.